Abstract

Many of the current round of experiments searching for anisotropies in the Microwave
Background Radiation (MBR) are confronting the problem of how to disentangle the cosmic signal
from contamination due to galactic and intergalactic foreground sources. Here we show how
commonly used likelihood function techniques can be generalized to account for foreground.
Specifically we set some restrictions on the spectrum of foreground contamination but allow the
amplitude to vary arbitrarily. The likelihood function thus generalized gives reasonable limits on
the MBR anisotropy which, in some cases, are not much less restrictive than what one would get
from more detailed modeling of the foreground. Furthermore, the likelihood function is exactly
the same as one would obtain by simply projecting out foreground contamination and just looking
at the reduced data set. We apply this generalized analysis to the recent medium angle data sets
of ACME-HEMT (Gaier et al. 1992, Schuster et al. 1993) and MAX (Meinhold et al. 1993,
Gunderson et al. 1993). The resulting analysis constrains the one free parameter in the standard
cold dark matter theory to
be $Q_{\rm rms-ps} = 18^{+8}_{-5}\,\mu$K. This best fit value, although in striking agreement with the normalization
from COBE, is not a very good fit, with an overall $\chi^2$/degree of freedom = 208/168. We also
argue against three commonly used methods of dealing with foreground: (i) ignoring it completely; 
(ii) subtracting off a best fit foreground and treating the residuals as if uncontaminated; and (iii) 
culling data which appears to be contaminated by foreground. 
1. Introduction



Since the detection of Microwave Background Radiation (MBR) anisotropies by the
Differential Microwave Radiometer (DMR) on the COsmic Background Explorer (COBE) satellite
(Smoot et al. 1992), there has been a spate of announcements of detections of anisotropies in
microwave brightness on smaller angular scales (Gaier et al. 1992, Meinhold et al. 1993, Schuster
et al. 1993, Gunderson et al. 1993, and more). Many of these experiments utilize measurements
at multiple frequencies in order to distinguish primordial anisotropies in the MBR from
other types of microwave emission, such as dust, free-free, or synchrotron, which may occur either
in our Galaxy or in extragalactic objects. While the large scale anisotropies observed by the
COBE-DMR appear to have relatively little foreground contamination, the spectra of many of
the small scale anisotropies found are not very well fit by frequency-independent brightness
temperature fluctuations and are therefore probably not completely due to primordial anisotropy. As
stressed by Brandt et al. (1993), it will take more and better observations to be able to
disentangle the primordial anisotropies from the foreground contamination. In the meantime one would
like to use the small-scale measurements to set limits on the parameters of models of cosmological
inhomogeneities. To obtain such limits one must take into account the uncertain contamination of
the measured anisotropies. In this paper we will discuss some methods of accounting for the
contamination when constraining parameters and apply them to some of the ACME-HEMT (Gaier et
al. 1992, Schuster et al. 1993) and MAX (Meinhold et al. 1993, Gunderson et al. 1993) data. We
stress that taking into account unknown amounts of contamination involves great uncertainties and
different approaches can be expected to yield different limits on parameters. Here the emphasis
will be on setting reliable, and perhaps conservative, limits on model parameters.

In §2 we introduce the likelihood function which is the standard tool used to compare 
theoretical predictions with MBR anisotropy data. We then go on to discuss various properties of 
the generalization of the likelihood function to include a known statistical distribution of sources. 
We show that this generalized likelihood function has a well-defined and useful limit when we 
take the amplitude of the foreground anisotropies to be large. This limit is independent of the 
spatial correlations of the foreground emission and depends only on the assumed spectra of the 
various components of foreground contamination. This limit is also equivalent to "projecting out" 
or "marginalizing" (Anthony Lasenby's terminology) the foreground emission from multifrequency 
data. Finally in §2 we show how this limit may be easily taken when the MBR model and detector 
noise are assumed Gaussian. In §3 we give some examples of how the procedure behaves by applying
it to the data of Gaier et al. (1992). In §4 we apply it to the rest of the data. Finally in §5 we discuss
the meaning of the results obtained.
2. Likelihood Functions



There are at least three components to microwave temperature anisotropies measured on the
sky. For cosmologists the most interesting component is the set of primordial MBR anisotropies.
These are caused by the initial inhomogeneities in the Universe which are thought to be set up by
some random process occurring at very early cosmological times. The particular random process
we will refer to as a cosmological model. Here we will suppose one is considering only a subset of
possible models, parameterized by a few numbers. Let us denote these numbers by the shorthand
$\mathcal{P}$, for parameters. What we would like to do is estimate $\mathcal{P}$ from a set of MBR anisotropy
measurements. Let $\mathcal{T}$ represent the true values of the primordial MBR anisotropies that have been
estimated by experiments. Given $\mathcal{T}$ we can estimate $\mathcal{P}$ using the probability density of $\mathcal{T}$ given
$\mathcal{P}$, i.e. $p_1(\mathcal{T}|\mathcal{P})\,d\mathcal{T}$. In addition to the primordial MBR anisotropy there is also a component of
foreground contamination which adds to the signal coming into an MBR experiment, or
symbolically $\mathcal{S} = \mathcal{T} + \mathcal{F}$, where $\mathcal{S}$ and $\mathcal{F}$ refer to external signal and foreground contamination. The third
ingredient is the observational error or detector noise, $\mathcal{N}$, in measuring brightness fluctuations.
Thus the data we obtain, $\mathcal{D}$, can be written as $\mathcal{D} = \mathcal{S} + \mathcal{N} = \mathcal{T} + \mathcal{F} + \mathcal{N}$. By testing and good
experimental design we usually know the distribution of $\mathcal{N}$ or mathematically $p_2(\mathcal{D}|\mathcal{S})\,d\mathcal{D}$ which
is the probability density of $\mathcal{D}$ given $\mathcal{S}$. Unfortunately we do not know much about the properties
of $\mathcal{F}$ and this limits what we can say about $\mathcal{P}$.

For the moment assume $\mathcal{F}$ is zero, as is done in many analyses of MBR experiments. Then
one has all of the ingredients to construct the probability density of $\mathcal{D}$ conditional on $\mathcal{P}$, i.e.

$$ p_3(\mathcal{D}|\mathcal{P})\,d\mathcal{D} = d\mathcal{D} \int p_2(\mathcal{D}|\mathcal{T})\,p_1(\mathcal{T}|\mathcal{P})\,d\mathcal{T} . \qquad (1) $$

One approach to inferring $\mathcal{P}$ from $\mathcal{D}$ is to choose $\mathcal{P}$ such that for that value of $\mathcal{P}$ the observed data,
$\mathcal{D}$, is near [but not too near] the mean of the distribution. Along the same lines one might require
that $\mathcal{P}$ be chosen such that $\mathcal{D}$ has a high probability density, when compared with other possible
values of $\mathcal{D}$. These methods may be described as requiring "goodness-of-fit" of the data and involve
fixing $\mathcal{P}$ and looking at the probability distribution of $\mathcal{D}$. Of course what we would really like is
the probability distribution of $\mathcal{P}$. Unfortunately there is nothing in probability theory that would
allow us to convert the data into such a probability distribution without further assumptions.

Another way to infer $\mathcal{P}$ from $\mathcal{D}$ involves not the probability of $\mathcal{P}$, but its likelihood. That is,
fix the data at the observed value, $\mathcal{D}$, and choose $\mathcal{P}$ such that the probability density, $p_3(\mathcal{D}|\mathcal{P})$, be
large when compared to other values of $\mathcal{P}$. Clearly a large probability density near the observed
data favors such a value of $\mathcal{P}$. Another motivation for this procedure stems from Bayes'
Theorem. Let us suppose we had made the further assumption (or had further knowledge) that the
parameters, $\mathcal{P}$, were determined by some random process, with probability distribution $p_4(\mathcal{P})\,d\mathcal{P}$.
We wish to evaluate the $\mathcal{P}$ used to generate the data, $\mathcal{D}$. Bayes' theorem says that given that
the measurements yield $\mathcal{D}$, the new (posterior) probability distribution for $\mathcal{P}$ is just the original
(prior) distribution times the likelihood function, $L$, or mathematically

$$ p_5(\mathcal{P}|\mathcal{D})\,d\mathcal{P} = L(\mathcal{P};\mathcal{D})\,p_4(\mathcal{P})\,d\mathcal{P}, \qquad L(\mathcal{P};\mathcal{D}) \equiv \frac{p_3(\mathcal{D}|\mathcal{P})}{\int p_3(\mathcal{D}|\mathcal{P}')\,p_4(\mathcal{P}')\,d\mathcal{P}'} . \qquad (2) $$

Note that the ratio of $L$ for different values of $\mathcal{P}$, which is really just the ratio of $p_3(\mathcal{D}|\mathcal{P})$, does
not depend on the prior distribution. It is this likelihood ratio which gives the relative increase or
decrease of the probability of different values of $\mathcal{P}$ and this is why one might consider values of $\mathcal{P}$
with relatively high likelihood as being preferred. Even if one doesn't have any knowledge of the
prior distribution of $\mathcal{P}$ one can still use the likelihood ratio as a statistic to choose $\mathcal{P}$, although
one then cannot determine the probability of the favored choice being correct.
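
As a minimal numerical sketch of eq. (2), the posterior can be tabulated on a grid as prior times likelihood and then normalized. The Gaussian-shaped likelihood below is a made-up placeholder rather than anything derived from data, and the function name `posterior_on_grid` is hypothetical. The example checks the point made above: the posterior ratio divided by the prior ratio returns the likelihood ratio, whichever prior is used.

```python
import numpy as np

def posterior_on_grid(params, likelihood, prior):
    """Eq. (2) on a uniform parameter grid: prior times likelihood, normalized."""
    params = np.asarray(params, dtype=float)
    unnormalized = np.asarray(likelihood, dtype=float) * np.asarray(prior, dtype=float)
    step = params[1] - params[0]              # uniform grid spacing is assumed
    return unnormalized / (unnormalized.sum() * step)

# Hypothetical one-parameter likelihood, used only to exercise the function.
grid = np.linspace(1.0, 50.0, 491)
likelihood = np.exp(-0.5 * ((grid - 18.0) / 6.0) ** 2)

post_flat = posterior_on_grid(grid, likelihood, np.ones_like(grid))
post_inverse = posterior_on_grid(grid, likelihood, 1.0 / grid)

# Dividing the posterior ratio by the prior ratio recovers the likelihood ratio,
# whichever prior was used.
i, j = 100, 300
likelihood_ratio = likelihood[i] / likelihood[j]
recovered_flat = post_flat[i] / post_flat[j]
recovered_inverse = (post_inverse[i] / post_inverse[j]) / (grid[j] / grid[i])
```

The normalization uses a plain Riemann sum, which is adequate for a smooth posterior on a fine grid.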

In recent years the likelihood function has been commonly used as a statistic in analyzing
anisotropy experiments (see Readhead et al. (1989) or Myers (1990) for clear discussions), although
usually without regard to any possible foreground contamination. If contamination is present then
one must make some assumptions about the contamination in order to construct the likelihood
function. If one knows or guesses that the foreground is drawn from some probability distribution,
$p_6(\mathcal{F})\,d\mathcal{F}$, then

$$ L(\mathcal{P};\mathcal{D}) \propto p_3(\mathcal{D}|\mathcal{P}) = \int\!\!\int p_2(\mathcal{D}|\mathcal{T}+\mathcal{F})\,p_1(\mathcal{T}|\mathcal{P})\,p_6(\mathcal{F})\,d\mathcal{T}\,d\mathcal{F} \qquad (3) $$

where we have dropped the $\mathcal{P}$-independent denominator of $L$. One could use one's general
knowledge of the Galactic dust and gas and of extragalactic sources to construct a reasonable $p_6$. If
available one should make use of measurements of the emission at other frequencies to constrain the
foreground emission in the same direction one is measuring the microwave brightness anisotropy.
Some radio and infrared data are available for the measurements considered below and we will
give some discussion of these in §5. For the moment we will consider the case where we do not
have any compelling reason to choose one $p_6(\mathcal{F})$ over another. We now discuss how one can still set
believable limits on model parameters even in this state of ignorance.

Marginalization

If we really do not have much idea of what the foreground is doing then to set reliable 
(= conservative) limits on MBR anisotropy we should take a liberal view of what the foreground 
emission may do to contaminate the measurements. One might think that such an approach will 
lead to no limits at all, but this is not the case. If one sets some restrictions on the spectrum of
foreground contamination but allows the amplitude to vary arbitrarily then the likelihood function 
gives reasonable limits on the MBR anisotropy which, in some cases, are not much less restrictive 
than what one would get from more detailed modeling of the foreground. Furthermore, in this 
limit the likelihood function is exactly the same as one would obtain by simply projecting out 
foreground contamination and just looking at the reduced data set. 

Before proceeding to justify these claims we should be a bit more explicit about the
meaning of $\mathcal{D}$, $\mathcal{T}$, and $\mathcal{F}$. The brightness pattern on the sky, $I_\nu(\hat{n})$, is a continuous function of both
direction, $\hat{n}$, and frequency, $\nu$. It is the $\nu$ dependence which we can use to unambiguously separate
MBR anisotropy from foreground contamination. We therefore only consider multifrequency
experiments. The experiments only give estimates of the temperature convolved with some window
function in direction and frequency, i.e.

$$ \mathcal{S}_{(a,i)} = \int W_i(\hat{n})\,V_{(i,a)}(\nu)\,I_\nu(\hat{n})\,d^2\hat{n}\,d\nu, \qquad a = 1,\ldots,N_{\rm ch}, \quad i = 1,\ldots,N_{\rm p}, \qquad (4) $$

where $N_{\rm p}$ gives the number of spatial patches and $N_{\rm ch}$ gives the number of spectral channels which
for simplicity we have assumed is the same for all patches. For simplicity we have also assumed
that the window functions factorize into spatial ($W_i(\hat{n})$) and spectral ($V_{(i,a)}(\nu)$) parts and that
the spatial window is the same for each channel at that patch. In general we only require that
the window function averaged over frequency and weighted by the spectra of any component, i.e.
MBR or foreground, be the same for all components and all channels. With this assumption the
relation between the signal in different channels of the same patch is telling us only about the
spectrum and the detector noise, and not about the spatial pattern of brightness. The moments
of both the MBR and foreground brightness which contribute to the measured signal are just the
same convolution as given in eq. (4). The MBR brightness in a given direction is given by only
one parameter, the temperature $T_{\rm MBR}(\hat{n})$. We will assume that there are only a finite number of
components of contamination, say $N_{\rm f}$, each of which has a known spectrum. Furthermore we
assume that $N_{\rm f} < N_{\rm ch}$. If not then any possible observed signal could always be produced by some
combination of foreground emission and no MBR anisotropy. Finally $\mathcal{D}$ is just $\mathcal{S}$ of eq. (4) with
some added detector noise which is different in each patch and channel.

Let us count the number of degrees of freedom (dof) of the various terms which contribute
to the $N_{\rm p} \times N_{\rm ch}$ observations, $\mathcal{D}$: $\mathcal{T}$ has $N_{\rm p}$ dof, $\mathcal{F}$ has $N_{\rm p} \times N_{\rm f}$ dof, and $\mathcal{N}$ has $N_{\rm p} \times N_{\rm ch}$ dof. This
means that there are $(N_{\rm ch} - N_{\rm f}) \times N_{\rm p}$ dof of $\mathcal{D}$ which are linearly independent of $\mathcal{F}$. Thus even if
we let $\mathcal{F}$ span its entire range the resultant $\mathcal{D}$ does not span the entire space of observations, and
this is why a liberal attitude toward $\mathcal{F}$ still yields interesting results.
To implement our liberal approach toward $\mathcal{F}$ consider the class of probability distributions
for $\mathcal{F}$

$$ p_6(\mathcal{F})\,d\mathcal{F} = \alpha\,f_6(\alpha\mathcal{F})\,d\mathcal{F} . \qquad (5) $$

Any normalizable distribution must fall off for large $\mathcal{F}$, but as $\alpha \to 0$ the region of significant
probability density gets larger and larger and the variance of this prior distribution increases.
However the likelihood function of eq. (3) will remain well behaved since for fixed $\mathcal{D}$, $p_2(\mathcal{D}|\mathcal{T}+\mathcal{F})$
will fall off for large $\mathcal{F}$. Note that $\mathcal{T}$ and $\mathcal{F}$ cannot exactly cancel since they have different spectra. Thus in
the limit of large variance we may replace $p_6(\mathcal{F})$ with $p_6(0)$, and since this constant is independent
of $\mathcal{P}$ it will not enter into any likelihood ratio and we may drop it. Of course we require that
$p_6(0) \neq 0$, however this is true of any reasonable distribution. For large variance we have

$$ L(\mathcal{P};\mathcal{D}) \to L^*(\mathcal{P};\mathcal{D}) \propto \int\!\!\int p_2(\mathcal{D}|\mathcal{T}+\mathcal{F})\,p_1(\mathcal{T}|\mathcal{P})\,d\mathcal{T}\,d\mathcal{F} . \qquad (6) $$

Thus we see that this likelihood function is just what we would get for a uniform prior for $\mathcal{F}$. Such
a uniform prior may include correlations between the different patches or correlations between
different components of a multi-component foreground and the determinant of this correlation
matrix may enter into $p_6(0)$, however this will not affect the likelihood ratio of eq. (6). Thus we
are led to our first interesting result:

• In the limit of large variance all prior distributions for the foreground contamination yield 
the same likelihood function. 

We may simplify Eq. (6) still further if we make the assumption that the probability distribution
of detector noise does not depend on the amplitude of the signal, i.e.

$$ p_2(\mathcal{D}|\mathcal{T}+\mathcal{F}) = f_2(\mathcal{D}-\mathcal{T}-\mathcal{F}) . \qquad (7) $$

To see how this helps let us denote $\mathcal{D}$'s dof which are independent of $\mathcal{F}$ by $\mathcal{D}^{\rm ind}$ and the dependent
dof by $\mathcal{D}^{\rm dep}$. The MBR contribution to the signal may be similarly decomposed, but the foreground
of course has no independent part. We may thus rewrite Eq. (7) as

$$ p_2(\mathcal{D}|\mathcal{T}+\mathcal{F}) = f_2(\mathcal{D}^{\rm ind}-\mathcal{T}^{\rm ind},\ \mathcal{D}^{\rm dep}-\mathcal{T}^{\rm dep}-\mathcal{F}) . \qquad (8) $$

If we substitute this into eq. (6) and change one of the variables of integration from $\mathcal{F}$ to $\mathcal{D}^{\rm dep}$ we
find

$$ L^*(\mathcal{P};\mathcal{D}) \propto \int \left[ \int p_2(\mathcal{D}|\mathcal{T}+\mathcal{F})\,d\mathcal{D}^{\rm dep} \right] p_1(\mathcal{T}|\mathcal{P})\,d\mathcal{T} \propto L(\mathcal{P};\mathcal{D}^{\rm ind}) , \qquad (9) $$

since the term in square brackets is just the marginal distribution of $\mathcal{D}^{\rm ind}$ obtained by integrating
out $\mathcal{D}^{\rm dep}$. This gives us our second interesting result:

• The likelihood function constructed by using all of the data and a prior distribution for
the foreground contamination with very large variance is equal to the likelihood function 
obtained by considering only that linear combination of the data which is independent of 
the foreground contamination. 
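
The equivalence stated in this result can be checked numerically in the simplest Gaussian setting. The sketch below rests on assumptions that are not taken from the text: a single patch with three channels, an MBR signal that is the same in every channel, one foreground component with a made-up spectrum `F`, and a Gaussian foreground prior whose variance `fg_var` is made very large. All names and numbers are hypothetical. The difference in log-likelihood between two values of the MBR variance then comes out the same whether all of the data are used or only the foreground-independent combinations.

```python
import numpy as np

def gaussian_loglike(d, cov):
    """Log of a zero-mean Gaussian density, dropping the constant 2*pi term."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet)

def null_space_basis(F):
    """Orthonormal rows orthogonal to every row of F (shape N_f x N_ch, full rank)."""
    _, _, vt = np.linalg.svd(F, full_matrices=True)
    return vt[F.shape[0]:]

def signal_plus_noise_cov(c_mbr, sigma):
    """MBR common to all channels plus independent detector noise (one patch)."""
    ones = np.ones(len(sigma))
    return c_mbr * np.outer(ones, ones) + np.diag(sigma ** 2)

def loglike_wide_prior(d, c_mbr, sigma, F, fg_var):
    """All of the data, Gaussian foreground amplitudes with variance fg_var."""
    return gaussian_loglike(d, signal_plus_noise_cov(c_mbr, sigma) + fg_var * (F.T @ F))

def loglike_projected(d, c_mbr, sigma, F):
    """Only the linear combinations of the data that the foreground cannot reach."""
    Z = null_space_basis(F)
    return gaussian_loglike(Z @ d, Z @ signal_plus_noise_cov(c_mbr, sigma) @ Z.T)

# Hypothetical numbers (micro-Kelvin): three channels, one foreground component.
d = np.array([120.0, 60.0, 45.0])
sigma = np.array([25.0, 25.0, 30.0])
F = np.array([[4.0, 2.0, 1.0]])
c_low, c_high = 25.0 ** 2, 75.0 ** 2

log_ratio_wide = (loglike_wide_prior(d, c_high, sigma, F, 1e10)
                  - loglike_wide_prior(d, c_low, sigma, F, 1e10))
log_ratio_proj = (loglike_projected(d, c_high, sigma, F)
                  - loglike_projected(d, c_low, sigma, F))
```

Only likelihood ratios are compared, since the two likelihoods differ by a factor that does not depend on the model parameters.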

One can obtain $\mathcal{D}^{\rm ind}$ by projecting out all linear combinations of the spectra of the different
foreground components. This last identity is of great practical use since it reduces the dimensionality
of the data space and can make computations of the likelihood function much more tractable. Of
course this idea of projecting out the foreground contamination is not a new one. For example, it
has been used in constructing the "reduced galaxy" (RG) anisotropy map of the DMR experiment.
However the above equality does give an added justification for constructing such RG data sets,
as it shows that by doing so one is not throwing out any information except for one's assumptions
about the properties of the foreground contamination.

The likelihood function $L^*$ is obtained by integrating over arbitrarily large foreground
contamination, which is certainly not realistic. Given some true distribution of foreground contamination,
$p_6(\mathcal{F})$, under what circumstances will $L^*$ give a good approximation to the true likelihood function?
As mentioned before, $p_2$ in Eq. (3) will regulate the integral over $\mathcal{F}$ if $p_6$ does not. Since $p_2$ will
start falling roughly when the foreground emission exceeds either the detector noise or the observed
signal it follows that the condition for $L^*$ to be an accurate representation of $L$ is roughly that
the variance in the foreground emission greatly exceeds either the detector noise or the observed
signal.

In obtaining these results we have made no assumptions about the statistical properties
of the MBR anisotropy or the foreground contamination, although one will have to make some
assumptions about the former in order to compute $L^*$. In particular we haven't assumed anything
is Gaussian. We have assumed that the detector noise is independent of the amplitude of the signal,
however we think this likely to be a very good approximation. More important assumptions were
made about the experimental apparatus, in particular we have assumed that the effective window
function is the same for all channels of a given patch. For some experiments, such as the DMR, this
is an excellent approximation, while for others such as the ACME experiment it is not. The degree to
which this varying window can confuse spatial and spectral dependencies will depend on the spatial
distributions of the emission. One can alleviate this problem if the region of the sky is oversampled
by the experiment, in which case one can bin the data into synthetic beam patterns which are the
same for all channels. Finally we have assumed that we know the spectrum of the various foreground
contaminants. Here it is important only that deviations from the assumed spectra are not so large
that the $\mathcal{D}^{\rm ind}$ used in the likelihood function could be significantly contaminated by foreground
emission. Here we are making some assumption about the amplitude of foreground emission, but if
our assumed spectrum is fairly accurate then this is a much less stringent constraint than assuming
that the foreground emission is small compared with the observed signal. In the microwave region
there is fairly small uncertainty in the spectra of free-free emission, however for synchrotron and
dust emission this is not the case. Hopefully data from the DIRBE and FIRAS experiments on
COBE will show a fairly universal spectrum for dust emission at microwave frequencies. We do know
that at longer wavelengths synchrotron emission exhibits an unfortunately broad range of spectral
indices and this can lead to large uncertainty in constructing $\mathcal{D}^{\rm ind}$.

Gaussian Statistics 

Now we can apply this analysis specifically to the case where the MBR anisotropies are
Gaussian random noise with some unknown parameters and the detector noise is also Gaussian.
Let us represent the data, $\mathcal{D}$, by a set of numbers $\Delta_{(a,i)}$ which give the signal in channel $a$ of patch
$i$. We may similarly represent the contribution of the foreground contamination to each such patch
and channel as $\Delta^{\rm f}_{(a,i)}$. Thus the MBR anisotropy plus detector noise, $\mathcal{T}+\mathcal{N}$, is given by $\Delta_{(a,i)} - \Delta^{\rm f}_{(a,i)}$,
which has the correlation matrix

$$ \langle (\Delta_{(a,i)} - \Delta^{\rm f}_{(a,i)})(\Delta_{(b,j)} - \Delta^{\rm f}_{(b,j)}) \rangle = C_{(a,i)(b,j)}, \qquad C_{(a,i)(b,j)} = C^{\rm mbr}_{ij} + \sigma^2_{(a,i)}\,\delta_{ij}\,\delta_{ab} \qquad (10) $$

where $\sigma_{(a,i)}$ gives the instrumental noise and $C^{\rm mbr}_{ij}$ gives the expected correlation in MBR
fluctuations. We will assume that we have some model which determines $C^{\rm mbr}_{ij}$ modulo the value of some
parameters which are the $\mathcal{P}$ of the previous discussion. The MBR anisotropy contribution to the
signal and the instrumental noise are assumed Gaussianly distributed with zero mean and are thus
fully determined by their respective correlation matrices, $C^{\rm mbr}_{ij}$ and $\sigma^2_{(a,i)}\,\delta_{ij}\,\delta_{ab}$. The foreground
emission may be written as a sum over the different foreground emission processes (e.g. free-free,
synchrotron, dust), i.e.

$$ \Delta^{\rm f}_{(a,i)} = \sum_{\alpha=1}^{N_{\rm f}} F_{(\alpha,a)}\,A_{(\alpha,i)} \qquad (11) $$

where $\alpha$ labels the process, $A_{(\alpha,i)}$ gives the amplitude of emission of process $\alpha$ in beam $i$, and $F_{(\alpha,a)}$
gives the contribution per unit amplitude of process $\alpha$ in channel $a$. The quantities $F_{(\alpha,a)}$ are
assumed known, however the amplitudes $A_{(\alpha,i)}$ are not. It is these $A_{(\alpha,i)}$'s which represent the $\mathcal{F}$
in the previous subsection.

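One can explicitly compute the components of $\Delta_{(a,i)}$ which are independent of the foreground
by solving the system of equations

$$ \sum_{a=1}^{N_{\rm ch}} z_a\,F_{(\alpha,a)} = 0, \qquad \alpha = 1,\ldots,N_{\rm f}, \qquad (12) $$

say with a general purpose eigensystem solver. As long as one is able to distinguish the different
components of foreground contamination, i.e. as long as the $F_{(\alpha,a)}$ are linearly independent, there will
be an $(N_{\rm ch} - N_{\rm f})$-dimensional space of solutions of eq. (12). If one chooses a basis for this space,
$z^{(r)}_a$, where $r$ labels the basis vectors, then one obtains a set of coordinates on the data subspace
which are independent of the foreground:

$$ \Delta^{\rm ind}_{(r,i)} = \sum_{a=1}^{N_{\rm ch}} z^{(r)}_a\,\Delta_{(a,i)} . \qquad (13) $$

The correlation function on $\mathcal{D}^{\rm ind}$ is

$$ C^{\rm ind}_{(r,i)(s,j)} \equiv \langle \Delta^{\rm ind}_{(r,i)}\,\Delta^{\rm ind}_{(s,j)} \rangle = \sum_{a=1}^{N_{\rm ch}} \sum_{b=1}^{N_{\rm ch}} z^{(r)}_a\,C_{(a,i)(b,j)}\,z^{(s)}_b \qquad (14) $$

so the likelihood functions of eq. (9) are

$$ L^*(\mathcal{P};\mathcal{D}) \propto L(\mathcal{P};\mathcal{D}^{\rm ind}) \propto \frac{\exp(-\chi^2_{\rm ind}/2)}{\|C^{\rm ind}\|^{1/2}} \qquad (15) $$

where $\|\ \|$ indicates the determinant and

$$ \chi^2_{\rm ind} = \sum_{i=1}^{N_{\rm p}} \sum_{j=1}^{N_{\rm p}} \sum_{r=1}^{N_{\rm ch}-N_{\rm f}} \sum_{s=1}^{N_{\rm ch}-N_{\rm f}} \Delta^{\rm ind}_{(r,i)}\,(C^{\rm ind})^{-1}_{(r,i)(s,j)}\,\Delta^{\rm ind}_{(s,j)} . \qquad (16) $$

Of course, $\chi^2_{\rm ind}$ is just the chi-square statistic which for the correct choice of $C^{\rm ind}_{(r,i)(s,j)}$ will be
distributed like a $\chi^2$-distribution with $N_{\rm p} \times (N_{\rm ch} - N_{\rm f})$ dof.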
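
The projection described in this subsection can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code used for the analysis: the null space of the foreground spectra is obtained from a singular value decomposition instead of an eigensystem solver, the MBR signal is taken to be identical in all channels, and the channel frequencies, foreground spectra, noise levels and data are invented. The function names are hypothetical.

```python
import numpy as np

def foreground_free_basis(F):
    """Solve sum_a z_a F[alpha, a] = 0; F has shape (N_f, N_ch) and full row rank.
    Returns z with shape (N_ch - N_f, N_ch)."""
    _, _, vt = np.linalg.svd(F, full_matrices=True)
    return vt[F.shape[0]:]

def projected_chi2_and_loglike(delta, c_mbr, sigma, F):
    """delta, sigma: (N_ch, N_p) arrays; c_mbr: (N_p, N_p) MBR correlation matrix.
    Returns chi^2_ind and log L* (up to an additive constant)."""
    n_ch, n_p = delta.shape
    z = foreground_free_basis(F)
    d_ind = (z @ delta).reshape(-1)                   # projected data, index (r, i)
    # Full correlation matrix with index (a, i): same MBR term in every channel
    # pair, plus independent noise on the diagonal.
    cov = np.kron(np.ones((n_ch, n_ch)), c_mbr) + np.diag(sigma.reshape(-1) ** 2)
    proj = np.kron(z, np.eye(n_p))                    # maps index (a, i) to (r, i)
    c_ind = proj @ cov @ proj.T                       # projected correlation matrix
    chi2 = d_ind @ np.linalg.solve(c_ind, d_ind)
    _, logdet = np.linalg.slogdet(c_ind)
    return chi2, -0.5 * (chi2 + logdet)

rng = np.random.default_rng(0)
freqs = np.array([25.0, 30.0, 35.0, 40.0])            # hypothetical channel frequencies
F = np.vstack([(freqs / 30.0) ** -2.1,                # illustrative falling power law
               (freqs / 30.0) ** 2.0])                # illustrative rising power law
c_mbr = 40.0 ** 2 * np.array([[1.0, 0.5, 0.2],
                              [0.5, 1.0, 0.5],
                              [0.2, 0.5, 1.0]])
sigma = np.full((4, 3), 30.0)
delta = rng.normal(0.0, 50.0, size=(4, 3))

chi2, loglike = projected_chi2_and_loglike(delta, c_mbr, sigma, F)
# Adding any amount of either foreground component leaves the statistic unchanged.
amplitudes = rng.normal(0.0, 1.0e3, size=(2, 3))
chi2_contaminated, _ = projected_chi2_and_loglike(delta + F.T @ amplitudes, c_mbr, sigma, F)
```

Any other basis of the same null space would change the likelihood only by a parameter-independent factor, so the orthonormal basis returned by the SVD is a convenience, not a requirement.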

3. Example: South Pole 91 

In this section, we will flesh out the formalism we've set up with an example. We will restrict 
our analysis to testing one particular theory, cold dark matter (CDM) with a Harrison-Zel'dovich
initial spectrum. We'll focus on an experiment that's been analyzed a number of times already, so 
at least the first part of our discussion, which assumes there are no foreground sources, should be 
familiar to many. This experiment, which we refer to as SP91, was performed at the South Pole 
in 1991 with the ACME-HEMT telescope, and the results were published in Gaier et al. (1992). 

One Patch; One Frequency Channel 

We start by considering the simplest possibility: An experiment measures the temperature
difference in one region of the sky at one frequency. This one measurement, call it $\Delta$, is thus the
full set of our data $\mathcal{D}$. The first thing we need to know is what is the temperature distribution
- the full set $\mathcal{T}$ is now simply one number $T$ - predicted by the theory, $p_1(T|Q_{\rm rms-ps})$. The one
parameter in the theory [the set previously denoted $\mathcal{P}$] is the normalization, which we'll take as
$Q_{\rm rms-ps}$, the average quadrupole over an ensemble of universes. Since perturbations are Gaussian
in standard cold dark matter, we can write

$$ p_1(T|Q_{\rm rms-ps}) = \frac{1}{\sqrt{2\pi C^{\rm mbr}}} \exp\left\{ -\frac{T^2}{2C^{\rm mbr}} \right\} . \qquad (17) $$

The width of the Gaussian here, $C^{\rm mbr}$, depends on both the theory and the experimental
configuration of the beam. In particular,

$$ C^{\rm mbr} = \sum_{l=2}^{\infty} \frac{2l+1}{4\pi}\,C_l\,W_l \qquad (18) $$

where $C_l$ is the coefficient of the Legendre polynomial $P_l(\hat{n}_1 \cdot \hat{n}_2)$ when $C(\hat{n}_1 \cdot \hat{n}_2) = \langle T(\hat{n}_1)T(\hat{n}_2) \rangle$
is expanded in a series of such polynomials. That is, the $C_l$'s are given by the theory. The theory's
one free parameter, $Q_{\rm rms-ps}$, is related to $C_2$ via: $C_2 = 4\pi Q^2_{\rm rms-ps}/5$. Meanwhile, the window function
$W_l$ is solely determined by the experimental beam size and chopping strategy. For the SP91
experiment (Bond et al. 1991)

$$ W_l = \exp\{-l(l+1)\theta_s^2\} \sum_{m=-l}^{l} H_0^2(m\phi_A)\,Y_{lm}^2(\theta_z, 0) \qquad (19) $$

where $\theta_z = 27.75°$; $\phi_A = 1.5°/\sin(\theta_z)$; $\theta_s = 0.425 \times 1.35°$ [for the highest frequency channel
we are discussing at present]; and $H_0$ is the Struve function of order zero. The $C_l$'s for CDM
and $W_l$ for SP91 are plotted in Figure 1. Once the $C_l$ and $W_l$ are given, it is straightforward to
combine them and compute the expected variance $C^{\rm mbr}$. For standard CDM and SP91, we find
$C^{\rm mbr} = (43\,Q_{\rm rms-ps}/17)^2$. [Recall that COBE-normalized CDM has $Q_{\rm rms-ps} = 17\,\mu$K.]
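
The sum in eq. (18) is straightforward to evaluate once $C_l$ and $W_l$ are tabulated. The sketch below does not reproduce the CDM spectrum or the SP91 window: it assumes a pure Sachs-Wolfe spectrum, $C_l \propto 1/[l(l+1)]$, normalized through $C_2 = 4\pi Q^2_{\rm rms-ps}/5$, and a toy window containing only Gaussian beam smoothing. Both are stand-ins chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def expected_variance(c_l, w_l, l_min=2):
    """Eq. (18): sum over l >= l_min of (2l+1)/(4 pi) * C_l * W_l."""
    c_l = np.asarray(c_l, dtype=float)
    w_l = np.asarray(w_l, dtype=float)
    ell = np.arange(l_min, l_min + len(c_l))
    return float(np.sum((2 * ell + 1) / (4 * np.pi) * c_l * w_l))

def sachs_wolfe_c_l(q_rms_ps, l_max):
    """Illustrative spectrum C_l proportional to 1/(l(l+1)), pinned by C_2 = 4 pi Q^2 / 5."""
    ell = np.arange(2, l_max + 1)
    c_2 = 4 * np.pi * q_rms_ps ** 2 / 5
    return 6 * c_2 / (ell * (ell + 1))

def gaussian_beam_window(theta_s, l_max):
    """Toy window exp(-l(l+1) theta_s^2): beam smoothing only, no chopping."""
    ell = np.arange(2, l_max + 1)
    return np.exp(-ell * (ell + 1) * theta_s ** 2)

l_max = 1000
theta_s = np.radians(0.425 * 1.35)               # beam width quoted in the text
window = gaussian_beam_window(theta_s, l_max)
var_17 = expected_variance(sachs_wolfe_c_l(17.0, l_max), window)
var_34 = expected_variance(sachs_wolfe_c_l(34.0, l_max), window)

# A window that passes only the quadrupole returns Q^2 exactly.
quadrupole_only = np.zeros(l_max - 1)
quadrupole_only[0] = 1.0
var_quadrupole = expected_variance(sachs_wolfe_c_l(17.0, l_max), quadrupole_only)
```

Because every $C_l$ scales as $Q^2_{\rm rms-ps}$, the variance is quadratic in the normalization, which is the form of the relation quoted above for SP91.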

The next step is to account for experimental errors by calculating $p_2(\Delta|T)$. We assume the
errors are Gaussian, so that

$$ p_2(\Delta|T) = \frac{1}{\sqrt{2\pi}\,\sigma_{\rm exp}} \exp\left\{ -\frac{(\Delta - T)^2}{2\sigma^2_{\rm exp}} \right\} . \qquad (20) $$

Here, $\sigma_{\rm exp}$ is the standard deviation of measurements in the 'lab' where there are no other sources [cosmic
or otherwise] contributing to the signal. So, eq. (20) simply tells us that if $\sigma_{\rm exp}$ were very small,
the observed value $\Delta$ would be very close to the actual value on the sky, $T$. On the other hand,
if the noise is significant, the observed value could differ significantly from the sky value. One of
the exciting things about the SP91 experiment is that $\sigma_{\rm exp}$ was of order 20-30 $\mu$K, significantly
below the signal predicted by CDM.

FIG. 1. The window function $W_l$ for the SP91 experiment and the $C_l$ in CDM with $h = 0.5$; $\Omega_B = 0.06$; $n = 1$.
Plotted is $l(l+1)C_l/6$ which is equal to one if only the Sachs-Wolfe effect is considered. The dashed line is an
approximation to the SP91 filter function [assuming square well chopping] which is seen to be off by as much as
30%. Also shown is the filter function for the MAX experiments to be analyzed in the next section.

Now that $p_1$ and $p_2$ are given, we can convolve the two as required by eq. (1) to form
$p_3(\Delta|Q_{\rm rms-ps})$. The integral over $T$ is readily performed and we find

$$ p_3(\Delta|Q_{\rm rms-ps}) = \frac{1}{(2\pi)^{N/2}\,\|C\|^{1/2}} \exp\left\{ -\frac{\Delta\,C^{-1}\Delta}{2} \right\} \qquad (21) $$

where $N$ is the number of measurements [here just one] and the correlation "matrix" [in this case
one by one]

$$ C \equiv \langle |T + \mathcal{N}|^2 \rangle = C^{\rm mbr} + \sigma^2_{\rm exp} . \qquad (22) $$
Figure 2 shows $p_3$ as a function of the observed temperature $\Delta$ for several values of the
normalization $Q_{\rm rms-ps}$. Here is a good time to hearken back to our discussion following eq. (1). There
we argued that there were two classes of ways one might constrain the parameters in a theory.
Let us illustrate these two classes with the aid of Figure 2. First, we could require $Q_{\rm rms-ps}$ to be
such that the probability density of the observed value of $\Delta$ is reasonably high. For example, if $\Delta$
were 0 $\mu$K, we might allow the parameters $Q_{\rm rms-ps} = 10\,\mu$K and $Q_{\rm rms-ps} = 20\,\mu$K, but frown on
$Q_{\rm rms-ps} = 30\,\mu$K, because the probability density of $\Delta = 0\,\mu$K is unacceptably low. If, however, $\Delta$
was observed to be 200 $\mu$K we would rule out all three values of $Q_{\rm rms-ps}$, because the probability
density is too low in each case. Now consider the second method: the likelihood approach. In this
approach, we compare the probability density of $\Delta$ for different values of $Q_{\rm rms-ps}$ and throw out
values of $Q_{\rm rms-ps}$ which have likelihoods significantly smaller than the "best" values of $Q_{\rm rms-ps}$. In
our artificial example here, with only three values of $Q_{\rm rms-ps}$, this means that if $\Delta$ were 200 $\mu$K, we
would throw out $Q_{\rm rms-ps} = 10, 20\,\mu$K, but keep $Q_{\rm rms-ps} = 30\,\mu$K since this is the value of $Q_{\rm rms-ps}$
at which the likelihood [for the observed $\Delta$] is maximum. There is something unsatisfactory about
this: We are accepting a value of the parameter which gives the best fit, but it is not a particularly
good fit. So although we will follow the second approach to find the best fit $Q_{\rm rms-ps}$, we will use
the first approach to "check" the goodness of this best fit.
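
The single-patch density is simple enough to evaluate directly. The sketch below uses the relation $C^{\rm mbr} = (43\,Q_{\rm rms-ps}/17)^2$ quoted earlier for SP91 and assumes a noise level of 25 $\mu$K, a value picked from the quoted 20-30 $\mu$K range purely for illustration. It evaluates the density at two illustrative observed values, $\Delta = 0$ and $\Delta = 200\,\mu$K, for the three normalizations discussed above. The function names are hypothetical.

```python
import numpy as np

def c_mbr_sp91(q_rms_ps):
    """Expected MBR variance (micro-Kelvin squared), using the relation quoted in the text."""
    return (43.0 * q_rms_ps / 17.0) ** 2

def p3(delta, q_rms_ps, sigma_exp):
    """Single-measurement Gaussian density with variance C = C_mbr + sigma_exp^2."""
    c = c_mbr_sp91(q_rms_ps) + sigma_exp ** 2
    return np.exp(-0.5 * delta ** 2 / c) / np.sqrt(2 * np.pi * c)

qs = np.array([10.0, 20.0, 30.0])   # the three normalizations discussed in the text
sigma_exp = 25.0                     # assumed noise level (micro-Kelvin)

density_at_0 = p3(0.0, qs, sigma_exp)
density_at_200 = p3(200.0, qs, sigma_exp)

# Likelihood approach: keep the normalization with the largest density at the observed value.
best_q_at_0 = qs[np.argmax(density_at_0)]
best_q_at_200 = qs[np.argmax(density_at_200)]

# Goodness-of-fit approach: look at how small the density is in absolute terms,
# here relative to the peak of the same distribution.
relative_density_at_200 = density_at_200 / p3(0.0, qs, sigma_exp)
```

Among these three values the likelihood at $\Delta = 200\,\mu$K is largest for $Q_{\rm rms-ps} = 30\,\mu$K, even though the density there is only a few percent of its peak value, which is the tension between the two approaches described above.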

Many Patches; One Frequency Channel 

The nine point scan of SP91 reported temperature differences for nine patches on the sky.
The discussion above is easily generalized to the multi-patch case. The data $\mathcal{D}$ which before was
a single measurement $\Delta$ is now a series of measurements, $\Delta_i$, $i = 1, \ldots, N_{\rm patch} = 9$. We have seen
that all we need to calculate $p_3$ [or the likelihood function] is the correlation matrix, $C$, which now
is $9 \times 9$. The experimental errors are assumed independent so

$$ C_{ij} = \langle T_i T_j \rangle + \delta_{ij}\,\sigma^2_{{\rm exp},i} . \qquad (23) $$

The off-diagonal elements of the theoretical correlation matrix $\langle T_i T_j \rangle$ are given by the same
sum in eq. (18), with

$$ W_{l,ij} = \exp\{-l(l+1)\theta_s^2\} \sum_{m=-l}^{l} \cos(m(\phi_i - \phi_j))\,H_0^2(m\phi_A)\,Y_{lm}^2(\theta_z, 0) \qquad (24) $$

where $\phi_i$ is the azimuthal angle corresponding to the center of the $i$th patch [the polar angle $\theta_z$ is
kept constant throughout the scan].

FIG. 2. The probability distribution for the observed temperature in a single patch of the SP91 experiment,
$p_3(\Delta|Q_{\rm rms-ps})$, for three different values of the CDM normalization, $Q_{\rm rms-ps}$. The distribution, which is the
same for $-\Delta$, is normalized so its integral over all $\Delta$ is one.

There is one further complication that must be dealt with before we can show the likelihood
function. The experimenters subtract from each scan an "offset" and a "drift." That is, they
subtract from each scan the best fit line. So the temperature differences they are actually reporting
are

$$ \Delta'_i = \Delta_i - (m\,t_i + b) . \qquad (25) $$

There are several ways of accounting for this. Bond et al. (1991) assumed a uniform prior for the
average and gradient and integrated them out. Another possibility is to note that the 8th and 9th
patch are really redundant, since they are fixed by the requirement that the mean and gradient
vanish. Thus we could simply project onto the seven-dimensional space of measurements. All of
this has a familiar ring to it: these are precisely the two alternatives we found to be equivalent
when we talked about foreground. So as a warmup exercise to subtracting off foreground, let's
apply the formalism we set up in §2 to subtract off the mean [we won't worry about the
gradient in this discussion, although we do subtract it off to get our final results]. Analogous to
eq. (12) we want the final temperature set to have zero mean. If all the errors were the same, this
would mean $\sum_{i=1}^{9} z_i = 0$. Putting in the weighting factors leads to

$$ \sum_{i=1}^{9} \frac{z_i}{\sigma_{{\rm exp},i}} = 0 . \qquad (26) $$

This equation is satisfied for eight linearly independent $z^{(r)}_i$, $r = 1, \ldots, 8$. Once we find these eight
vectors, we can then form the mean-subtracted temperatures

$$ \Delta'_r = \sum_{i=1}^{9} z^{(r)}_i\,\Delta_i . \qquad (27) $$

To calculate $p_3$, we can still use eq. (21), but now the correlation matrix that enters is the reduced
correlation matrix

$$ C'_{rs} \equiv \langle \Delta'_r\,\Delta'_s \rangle = z^{(r)}_i\,z^{(s)}_j\,C_{ij} . \qquad (28) $$
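
The same projection machinery removes an offset, or an offset together with a drift. The following sketch makes several assumptions for illustration only: all errors are equal (so the offset constraint is simply $\sum_i z_i = 0$), the patch positions are the integers 0 to 8, and the data and correlation matrix are invented. With unequal errors the constraint row would change as in eq. (26), which is why the constraints are left as an input. The function names are hypothetical.

```python
import numpy as np

def constraint_free_basis(constraints):
    """Orthonormal rows z^(r) with z^(r) . c = 0 for every row c of `constraints`."""
    constraints = np.atleast_2d(np.asarray(constraints, dtype=float))
    _, _, vt = np.linalg.svd(constraints, full_matrices=True)
    return vt[constraints.shape[0]:]          # assumes linearly independent rows

def reduce_data(delta, cov, constraints):
    """Project the data and the correlation matrix onto the constraint-free subspace."""
    z = constraint_free_basis(constraints)
    return z @ delta, z @ cov @ z.T

n_patch = 9
positions = np.arange(n_patch, dtype=float)      # hypothetical scan coordinate
offset_only = np.ones((1, n_patch))              # equal-error case: sum_i z_i = 0
offset_and_drift = np.vstack([np.ones(n_patch), positions])

rng = np.random.default_rng(1)
delta = rng.normal(0.0, 40.0, size=n_patch)      # made-up temperatures (micro-Kelvin)
cov = 30.0 ** 2 * np.eye(n_patch)                # made-up correlation matrix

reduced_8, cov_8 = reduce_data(delta, cov, offset_only)
reduced_7, cov_7 = reduce_data(delta, cov, offset_and_drift)
# Any straight line added to the scan drops out of the seven projected numbers.
shifted_7, _ = reduce_data(delta + 3.0 * positions + 100.0, cov, offset_and_drift)
```

Removing the offset alone leaves eight numbers; removing the offset and the drift leaves seven, matching the seven-dimensional space mentioned above.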

If we assume a prior distribution function, $p_4(Q_{\rm rms-ps})$, we can convert $p_3$ into a posterior
distribution for $Q_{\rm rms-ps}$, $p_5(Q_{\rm rms-ps}|\mathcal{D})$, using eq. (2). (We see from eq. (2) that the functional
dependence of the likelihood function on the model parameter is the same as the posterior probability
density for the parameters in a Bayesian analysis if the assumed prior is uniform in the
parameters. Since others (e.g. Srednicki et al. 1993) have put a Bayesian interpretation on their
analysis, we will find it convenient to discuss various prior distributions, and use the resultant
posterior probability densities to set limits on $Q_{\rm rms-ps}$. However, one should remember that when
we consider a uniform prior, we might just as well be doing a likelihood analysis as a Bayesian
analysis. The names are changed but they are mathematically equivalent.) Figure 3 shows this
posterior distribution function for the highest frequency (4th) channel of the SP91 experiment, in
the case of three different priors. For the 4th channel SP91 data, no matter which prior is used,
$p_5$ falls off fairly quickly, which led a number of groups to place fairly stringent upper limits. The
exact limit does depend sensitively on the prior, though. If we define an upper limit via



$$\int_0^{Q^{\rm upper\ limit}_{\rm rms-ps}} dQ\, p_5(Q) = 0.95, \qquad (29)$$



then the upper limits are $Q^{\rm upper\ limit}_{\rm rms-ps} = 9, 14, 20\,\mu$K for $p_4(Q_{\rm rms-ps}) = 1/Q_{\rm rms-ps}$, $1$, $Q_{\rm rms-ps}$,
respectively. One way to assess the upper limit associated with a given prior is to calculate the
level of significance of the test. This tells us how often we would rule out $Q^{\rm upper\ limit}_{\rm rms-ps}$ if that was the
true value. For example, when $p_4(Q_{\rm rms-ps}) = 1$, leading to an upper limit of $Q^{\rm upper\ limit}_{\rm rms-ps} = 14\,\mu$K,
FIG. 3. The posterior distribution for $Q_{\rm rms-ps}$ using the highest frequency channel of the SP91 experiment. The
three different curves correspond to three different choices of priors: $p_4 = 1/Q_{\rm rms-ps}$, $1$, $Q_{\rm rms-ps}$. The first such
prior has been cut off at $Q_{\rm rms-ps} = 2\,\mu$K to make it normalizable.

we can ask: If the Universe truly had $Q_{\rm rms-ps} = 14\,\mu$K, and many different experiments were done,
how often would someone using this test rule out $Q_{\rm rms-ps} = 14\,\mu$K? The levels of significance of
the tests with the priors $p_4 = 1/Q_{\rm rms-ps}$, $1$, $Q_{\rm rms-ps}$ are 0.11, 0.02, 0.002, respectively. Thus a prior
which favors high values of $Q_{\rm rms-ps}$ leads to very low levels of significance; i.e. it leads to upper
limits which are too stringent. For this reason, $p_4 = 1$ is often chosen, and we will stick to this
choice for the rest of our discussion.
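
The sensitivity of the 95% limit to the prior can be illustrated numerically. The sketch below is a toy, not the SP91 analysis: it assumes a hypothetical likelihood for `n_patch` independent measurements with equal noise `sigma`, in which the cosmic signal simply adds a variance $Q^2$ to each measurement (no window function and no patch-to-patch correlations). The function names, the grid, and the toy numbers are all illustrative assumptions; only the definition of the limit follows eq. (29). For simplicity all three priors share a grid that starts at $2\,\mu$K.

```python
import numpy as np

def toy_likelihood(q, sum_sq, n_patch, sigma):
    """Hypothetical Gaussian likelihood: each datum has variance q**2 + sigma**2."""
    var = q**2 + sigma**2
    log_like = -0.5 * n_patch * np.log(var) - 0.5 * sum_sq / var
    return np.exp(log_like - log_like.max())

def upper_limit(q, likelihood, prior, level=0.95):
    """Grid version of eq. (29): find Q where the posterior integral reaches `level`."""
    posterior = likelihood * prior
    steps = 0.5 * (posterior[1:] + posterior[:-1]) * np.diff(q)  # trapezoid rule
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    return float(np.interp(level, cdf, q))

# Toy numbers (assumptions): 7 patches, 10 uK noise, data consistent with pure noise.
q_grid = np.linspace(2.0, 300.0, 30000)
like = toy_likelihood(q_grid, sum_sq=7 * 10.0**2, n_patch=7, sigma=10.0)

limits = {
    "1/Q": upper_limit(q_grid, like, 1.0 / q_grid),
    "uniform": upper_limit(q_grid, like, np.ones_like(q_grid)),
    "Q": upper_limit(q_grid, like, q_grid),
}
for name, value in limits.items():
    print(f"prior {name:8s}: 95% upper limit = {value:.1f} uK")
```

In this toy the limit grows as the prior puts more weight on large $Q_{\rm rms-ps}$, which is the same ordering as the three limits quoted above.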

Many Patches; Many Frequency Channels 

We now account for the fact that SP91, like many modern anisotropy experiments, took 
measurements at a number of different frequency channels. In the SP91 experiments there were 
four frequency channels, spanning the range 25 to 35 GHz. Thus, the data set $\mathcal{D}$ now consists
of $\Delta_{(a,i)}$, $a = 1, \ldots, 4$; $i = 1, \ldots, 7$ [recall that the mean and gradient are subtracted out of each
channel]. To construct the posterior distribution $p_5$ [from now on we'll use $p_4 = 1$], we must
therefore form the correlation matrix

$$C_{(a,i)(b,j)} = \langle T_{a,i} T_{b,j} \rangle + \delta_{ij}\,\delta_{ab}\,\sigma^2_{{\rm exp},(a,i)}. \qquad (30)$$

Ordinarily, the expected cosmic signal would be the same in each channel, since the temperature 
differences are frequency independent. In SP91, there is a small complication owing to the different 
widths of the beams in the different channels. We have accounted for this by allowing $\theta_s$ in the
filter function of eq. (19) to be channel dependent.

Figure 4 shows the posterior distribution for $Q_{\rm rms-ps}$ given the 4-channel data in the SP91
experiment. Taken at face value, this data clearly seems to indicate a detection, i.e. $Q_{\rm rms-ps} = 0$ is
ruled out. Applying the test in eq. (29) to determine an upper limit and a similar one to determine
the lower limit, i.e.

$$\int_{Q^{\rm lower\ limit}_{\rm rms-ps}}^{\infty} dQ\, p_5(Q) = 0.05, \qquad (31)$$

we find

$$Q_{\rm rms-ps} = 10^{+\ldots}_{-\ldots}\,\mu{\rm K} \qquad (32)$$

at the 95% confidence level. Eq. (32) tells us that the best fit for this experiment is at about
$Q_{\rm rms-ps} = 10\,\mu$K. Is this "best fit" a good fit? When there was only one data point we could
simply look at the distribution $p_3$ and see whether or not the distribution function was acceptably
high at the actual values observed. Unfortunately, it is harder to do this in a 28 dimensional space.
Instead, a good number to look at for these purposes is the $\chi^2$, defined in eq. (16). The observed
$\chi^2$ should be of order the number of degrees of freedom, in SP91 (4 channels) $\times$ (7 patches) $-$
(1 normalization parameter) = 27, with a standard deviation of $\sqrt{2 \times N_{\rm dof}} \simeq 7$. However, in this
case the observed $\chi^2 = 46$ for the four-channel data. This tells us that our best fit is not a very
good fit at all, and we had better look at a different theory or at least at the possibility of other sources.
We turn next to this latter possibility.
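
As a rough cross-check of how unlikely $\chi^2 = 46$ is for 27 degrees of freedom, the tail probability of the $\chi^2$ distribution can be evaluated directly. The helper below is an illustrative sketch (the name `chi2_tail` is ours and not part of the analysis). It uses the standard recurrence $Q(a+1,y) = Q(a,y) + y^a e^{-y}/\Gamma(a+1)$ for the regularized upper incomplete gamma function, so it needs only the Python standard library and assumes an integer number of degrees of freedom and $\chi^2 > 0$.

```python
import math

def chi2_tail(chi2, dof):
    """Probability of a chi-square at least this large for an integer dof."""
    y = 0.5 * chi2
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    # Step a -> a + 1 until a equals dof / 2.
    while 2.0 * a < dof:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

print(f"P(chi2 >= 46 | 27 dof) = {chi2_tail(46.0, 27):.3f}")
```

The result is close to one percent, which is why the four-channel best fit is described as a poor fit even though the excess over the expected 27 is less than three standard deviations.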

One Source of Foreground: Marginalization 

We now allow the possibility of foreground sources. The observed signal then is the sum of 
detector noise, foreground sources, and cosmic signal. As in Eq. (3), the distribution function for 
the data is 

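$$p_3(\mathcal{D}|Q_{\rm rms-ps}) = (2\pi)^{-N/2}\, ||C||^{-1/2} \int d\mathcal{F}\; p_6(\mathcal{F}) \exp\left\{-\frac{1}{2}(\mathcal{D} - \mathcal{F})\, C^{-1} (\mathcal{D} - \mathcal{F})\right\}. \qquad (33)$$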
FIG. 4. The posterior distribution for $Q_{\rm rms-ps}$ from the four channel, nine patch scan of SP91. The solid line
assumes no foreground contribution. The dashed line assumes a best fit free-free emission with frequency dependence
as in eq. (34). Dotted line shows the result if free-free emission is subtracted off. Dot-dashed line shows the results
if synchrotron and free-free are subtracted off.

The number of independent measurements is $N = 28$, one for each patch and channel. Until now we
have been implicitly assuming a delta function for the foreground prior: $p_6 \propto \delta(\mathcal{F})$, no foreground.
In principle, $\mathcal{F}$ is a set of 28 numbers, one for each patch and channel. Let us first suppose that
there is only one component to the foreground, free-free emission, say. Then, we expect the signal
in each channel to scale as

$$F_a \propto \nu_a^{-2.1} \qquad (34)$$

where $\nu_a$ is the frequency associated with the $a$th channel. The distribution function now becomes

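$$p_3(\mathcal{D}|Q_{\rm rms-ps}) = \frac{1}{(2\pi)^{N/2}\, ||C||^{1/2}} \int \prod_i dA_i \; p_6(\{A_i\}) \exp\left\{-\frac{1}{2}\left(\Delta_{(a,i)} - A_i F_a\right) C^{-1}_{(a,i)(b,j)} \left(\Delta_{(b,j)} - A_j F_b\right)\right\}. \qquad (35)$$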
If the foreground contribution is known in one of the channels, eq. (34) allows us to determine
the contribution in all channels. Thus, instead of 28 free parameters, we have added only seven.
Other maps of the region, radio maps or infrared maps, might give a reasonable prior $p_6$, so
that the integral in eq. (35) could be done and translated into a distribution for $Q_{\rm rms-ps}$. In
the absence of these, we will use a uniform prior for the $A_i$, which we've argued is equivalent to
considering only those linear combinations of the temperatures which are independent of the free- 
free emission. Before we do this, though, it is interesting to consider another possible approach 
to the integral in eq. (35). Besides the prior, the only dependence on the free-free emission comes 
in the quadratic polynomial in the exponential. One could easily minimize the argument of the 
exponential, thereby maximizing the distribution. If the integrand is sharply peaked, then this 
"best fit foreground" should be a good approximation to the distribution. Figure 4 shows the 
likelihood function obtained in this way. The best fit free-free leaves little room for anything else! 
The lesson to be learned from this example is not that a very stringent upper limit has been set. 
Rather, it is important to note the difference that foreground sources can make. The likelihood 
for Qrms-ps with best fit foreground differs significantly from the likelihood with no foreground. 
Since the foreground can make a world of difference and since we really don't have much prior 
information about the foreground, one reasonable approach is to keep only the parts of the signal 
that are independent of foreground, i.e. to marginalize. 
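
The equivalence between integrating over a foreground amplitude with a uniform prior and keeping only the foreground-independent combinations can be checked numerically for a single patch. The following is a minimal sketch under stated assumptions: the covariance matrix and data vector are random stand-ins rather than SP91 values, the function names are ours, and only the quadratic form in the exponent is compared (the overall normalization is ignored).

```python
import numpy as np

def marginalized_quadratic(data, cov, template):
    """Quadratic form left after integrating exp(-chi^2/2) over a uniform amplitude."""
    cinv = np.linalg.inv(cov)
    cinv_f = cinv @ template
    return float(data @ cinv @ data - (data @ cinv_f) ** 2 / (template @ cinv_f))

def projected_quadratic(data, cov, template):
    """Quadratic form built from combinations orthogonal to the template."""
    # Rows 1..N-1 of vt are an orthonormal basis orthogonal to the template.
    _, _, vt = np.linalg.svd(template[None, :])
    z = vt[1:]
    reduced_data = z @ data
    reduced_cov = z @ cov @ z.T
    return float(reduced_data @ np.linalg.solve(reduced_cov, reduced_data))

rng = np.random.default_rng(1)
freqs = np.array([26.25, 28.75, 31.25, 33.75])    # GHz
template = (freqs / freqs[0]) ** -2.1             # free-free scaling of eq. (34)

a = rng.normal(size=(4, 4))
cov = a @ a.T + 4.0 * np.eye(4)                   # random positive definite stand-in
data = rng.normal(size=4)

marg = marginalized_quadratic(data, cov, template)
proj = projected_quadratic(data, cov, template)
print(marg, proj)
```

The two numbers agree to rounding error, and adding any multiple of the template to the data leaves the projected form unchanged, which is the sense in which the reduced data set is independent of the foreground amplitude.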

We subtract off components dependent on foreground by solving eq. (12); we'll do this first 
assuming a foreground spectrum as in eq. (34). The four frequency channels in the SP91 experiment 
are centered at $\nu = 26.25, 28.75, 31.25, 33.75$ GHz. Therefore, we must solve

$$\sum_{a=1}^{4} z_a \left(\frac{26.25}{26.25 + (a-1)\,2.5}\right)^{2.1} = 0. \qquad (36)$$

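There are three independent solutions to this equation. So the four temperatures in the four
channels have been reduced to three temperatures, linear combinations of the four channels:

$$\begin{aligned}
\Delta^{\rm ind}_{(1,i)} &= -2.5\,\Delta_{(1,i)} + 1.3\,\Delta_{(2,i)} + 1.2\,\Delta_{(3,i)} + 1.0\,\Delta_{(4,i)} \\
\Delta^{\rm ind}_{(2,i)} &= \phantom{-}0.0\,\Delta_{(1,i)} - 6.7\,\Delta_{(2,i)} + 9.6\,\Delta_{(3,i)} - 1.9\,\Delta_{(4,i)} \\
\Delta^{\rm ind}_{(3,i)} &= \phantom{-}0.0\,\Delta_{(1,i)} - 2.2\,\Delta_{(2,i)} - 0.7\,\Delta_{(3,i)} + 3.9\,\Delta_{(4,i)}.
\end{aligned} \qquad (37)$$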

The index $i$ here still labels patches, as we perform this transformation in each patch. The normalization
is arbitrary; we have chosen it so that for MBR, with its flat spectrum, $\Delta^{\rm ind} = \Delta$. From
here on in, the construction of the likelihood function mimics the process we went through with
mean-subtraction. The main step is to construct the reduced correlation matrix as in eq. (14).
Figure 4 shows the likelihood function thus obtained. Perhaps the most surprising feature of the
marginalized likelihood function is that it leads to a very weak upper limit:

$$Q^{\rm upper\ limit}_{\rm rms-ps} = 52\,\mu{\rm K}. \qquad (38)$$

This is a bit surprising because one might have reasoned as follows: Perhaps the lower limit coming 
from the four channel data with no foreground should be ignored because part of the signal could 
have come from something else. But at least the upper limit should be reliable even if foreground 
is accounted for. We see now that this reasoning is wrong! Not only can foreground confuse us 
into thinking there is a cosmic signal when in reality none exists, but also foreground can take 
away part of a cosmic signal. Foreground can be negative when the cosmic signal is positive, so by 
ignoring foreground, one can underestimate the cosmic signal, and it is this possibility that leads 
to the weak upper limit. Some may find the weak upper limit of eq. (38) overly conservative since
they would consider the possibility of the foreground emission decreasing the variance in channel
4 to be negligible. We will argue against this point of view below in §5.
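
The three combinations quoted for the free-free case can be checked directly against the two conditions they are meant to satisfy: zero response to a $\nu^{-2.1}$ spectrum and unit response to a flat spectrum. The sketch below copies the rounded coefficients from the text; the last quantity, the factor by which each combination amplifies detector noise, assumes equal and uncorrelated noise in the four channels, which is a simplification introduced here for illustration.

```python
import numpy as np

freqs = np.array([26.25, 28.75, 31.25, 33.75])   # SP91 channel centers in GHz
free_free = (freqs / freqs[0]) ** -2.1           # assumed foreground scaling

# Rounded coefficients of the three foreground-independent combinations.
combinations = np.array([
    [-2.5,  1.3,  1.2,  1.0],
    [ 0.0, -6.7,  9.6, -1.9],
    [ 0.0, -2.2, -0.7,  3.9],
])

leakage = combinations @ free_free               # response to free-free, ideally 0
flat_response = combinations.sum(axis=1)         # response to a flat spectrum, ideally 1
noise_gain = np.sqrt((combinations ** 2).sum(axis=1))

for row in zip(leakage, flat_response, noise_gain):
    print("free-free response %+.3f, flat response %.2f, noise gain %.1f" % row)
```

With the rounded coefficients the free-free response vanishes to within rounding, while the noise gain of each combination is larger than one. That amplification of detector noise is one ingredient of the weak upper limit.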

Several Sources of Foreground 

Until now we have allowed only one source of foreground. What happens if we allow several 
sources? The answer is disquieting. We now show that with more than one source of foreground 
and no prior information about the amplitudes, no interesting limits can be placed on the parameter
$Q_{\rm rms-ps}$.

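If we subtract a synchrotron component [assuming $F \propto \nu^{-2.7}$] in addition to the free-free
component, the two independent temperatures in each patch are

$$\begin{aligned}
\Delta^{\rm ind}_{(1,i)} &= 11.3\,\Delta_{(1,i)} - 21.8\,\Delta_{(2,i)} - 0.8\,\Delta_{(3,i)} + 12.3\,\Delta_{(4,i)} \\
\Delta^{\rm ind}_{(2,i)} &= \phantom{1}0.0\,\Delta_{(1,i)} + 21.5\,\Delta_{(2,i)} - 54.7\,\Delta_{(3,i)} + 34.2\,\Delta_{(4,i)}.
\end{aligned} \qquad (39)$$

If three components (free-free, synchrotron, and dust [$F \propto \nu^{1.6}$]) are subtracted, then the one independent
component in each patch is

$$\Delta^{\rm ind}_{(1,i)} = 116\,\Delta_{(1,i)} - 420\,\Delta_{(2,i)} + 494\,\Delta_{(3,i)} - 189\,\Delta_{(4,i)}. \qquad (40)$$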

Recall that $\Delta^{\rm ind}$ is the sum of the cosmic component ($T^{\rm ind}$) and the detector noise ($N^{\rm ind}$). As
we've said, the cosmic component contributes equally in each channel so in a given patch $T^{\rm ind} = T$.
However, the detector noise is completely uncorrelated in the different channels, so the large
coefficients in Eqs. (39) and (40) tell us that the detector contribution to $\Delta^{\rm ind}$ will be enormous.
The cosmic signal will be dwarfed by the detector noise. As an example of this, Fig. 4 shows the
likelihood function when both synchrotron and free-free are subtracted off. The likelihood function
is extremely flat, and the upper limit is not very useful: $Q^{\rm upper\ limit}_{\rm rms-ps} = 200\,\mu$K.
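
The size of the coefficients can be reproduced with a small linear solve. With four channels and three power-law foregrounds, the conditions of zero response to each foreground and unit response to a flat spectrum determine the combination uniquely. The sketch below assumes the spectral indices quoted in the text ($-2.1$, $-2.7$ and $1.6$) and, for the noise estimate, equal uncorrelated noise in every channel; the variable names are ours.

```python
import numpy as np

freqs = np.array([26.25, 28.75, 31.25, 33.75])    # SP91 channel centers in GHz
indices = {"free-free": -2.1, "synchrotron": -2.7, "dust": 1.6}

# Row 0 enforces unit response to a flat (MBR) spectrum; the remaining rows
# enforce zero response to each assumed foreground power law.
constraints = np.vstack(
    [np.ones_like(freqs)] + [(freqs / freqs[0]) ** k for k in indices.values()]
)
target = np.array([1.0, 0.0, 0.0, 0.0])

z = np.linalg.solve(constraints, target)
noise_gain = np.linalg.norm(z)    # rms noise of the combination / single-channel noise

print("coefficients:", np.round(z, 1))
print("noise gain:  ", round(float(noise_gain), 1))
```

The solve returns coefficients in the hundreds, so the detector noise in the surviving combination is amplified by a similarly large factor. The four channels span such a narrow range that the three power laws and a flat spectrum are nearly linearly dependent.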

These conclusions are not particular to SP91. For all the experiments we will analyze, 
subtracting off more than one source of contamination results in data swamped by noise and 
therefore useless for setting limits. This may be due to the small number of channels [three 
for MAX- ACME and four for the South Pole scans]; perhaps more channels would improve the 
situation. And of course we have not used any information from other maps which might further 
constrain the sources. 

Perhaps subtracting off one source is enough though. One of the main incentives for considering
foreground when analyzing the SP91 results was the large $\chi^2$ for the best fit $Q_{\rm rms-ps}$ using
the four channel data [$\chi^2 = 46$ for 27 dof]. When only one source of foreground is included $\chi^2 = 28$
for 20 dof, a significantly better fit. In all of the experiments we analyze here, subtracting off one
foreground component makes the best fit $Q_{\rm rms-ps}$ a better fit.



4. Other Experiments 



In this section we apply the method developed and explained in the previous two sections
to three other recent experiments. Along with the nine point scan of the South Pole experiment,
there was a 13-point scan (Schuster et al. 1993) which probed a nearby region of the sky. At
higher frequencies, the Millimeter Anisotropy Experiment (MAX) recently reported results of two
scans: one around the region of $\mu$-Pegasus and the other near the star Gamma Ursae Minoris
(GUM).
FIG. 5. The likelihood functions for the SP 13 point scan and two MAX scans. The solid line labeled $\mu$-Peg
is the likelihood for $\mu$-Peg when warm dust is marginalized; the bottom-most dashed line is for $\mu$-Peg when
both warm dust and free-free [with $F \propto \nu^{-2.1}$] are marginalized. The solid line labeled SP13 is for the raw data
presented in the thirteen point scan of the South Pole experiment; the dashed line which is broader but peaks in
roughly the same place is for the same experiment with free-free marginalized. The solid curve for the GUM scan
is the likelihood using the raw data; the nearby dashed curve shows the likelihood when cold dust is marginalized.
The heavy solid line is the product of the likelihood functions for the four experiments, each with one foreground
component marginalized.
Experiment             Foreground                Q_rms-ps (μK)       χ²/d.o.f.   P(χ²)
SP 9-point scan        None                      ...                 46/27       .012
                       Free-Free                 22 (+.../-...)      28/20       .12
                       Free-Free + Synchrotron   < 184               18/13       .16
SP 13-point scan       None                      ...                 58/43       .06
                       Free-Free                 ...                 40/32       .17
MAX: μ-Peg scan        Warm Dust                 ...                 35/41       .73
                       Warm Dust + Free-Free     < 86                19/20       .52
MAX: GUM scan          None                      29 (+.../-...)      149/113     .02
                       Cold Dust                 29 (+.../-...)      98/75       .05
All Four Experiments   One Component             18 (+.../-...)      208/168     .02

Table 1. Results of Bayesian analyses of four experiments. The $Q_{\rm rms-ps}$ column lists the best fit $Q_{\rm rms-ps}$ with the error bars
indicating the 95% upper and lower limits using Bayesian analysis with a uniform prior for $Q_{\rm rms-ps}$. The column
headed $\chi^2$/d.o.f. lists the $\chi^2$ of the best fit $Q_{\rm rms-ps}$. The last column tabulates the probability of getting a $\chi^2$ as
large as or larger than this for the stated number of degrees of freedom.

Figure 5 shows the results of calculating the likelihood function for these experiments with
and without marginalization. Table 1 gives quantitative upper and lower limits and presents the
$\chi^2$ for the best fit value of $Q_{\rm rms-ps}$. Also tabulated is the probability of getting a $\chi^2$/dof this large,
which we'll take as a measure of goodness-of-fit.

The raw Schuster data indicates a detection, but as Table 1 shows, the best fit for this data
is quite poor if we assume it is all cosmic background. Specifically, the probability of getting a
$\chi^2$/dof = 58/43 or larger is only .06. The situation improves somewhat if one foreground component
is subtracted off. In this case, though, the detection becomes less significant: the likelihood function
at $Q_{\rm rms-ps} = 0$ is still 60% of its maximum at $Q_{\rm rms-ps} = 12\,\mu$K.

The MAX experiment has a slightly different chopping procedure than the South Pole ex- 
periments, so Eq. (24) is replaced by 

w /3.34\ , ,,,..^21 16yr V" f sm(m(j)^i/2)\ /sin(m0* 5j /2)\ 

x cos(m(0i - (f>j)) Ji(m(f>A,i) ^i W>aj)^4(^ O)Y /m (0 2j , 0). (41) 
Here $J_1$ is the Bessel function of order one. For both scans the beam smearing angle $\theta_s = 0.425 \times
0.5°$, and the chop amplitude was $\phi_A \sin(\theta_z) = 0.65°$. The $\mu$-Peg scan was taken at constant
$\theta_z = 65.45°$, with an angle of $\phi_* = 0.285°/\sin(\theta_z)$ between the center of each spatial patch. The
GUM scan was binned so that $\phi_* = 1.125°$. The scan around GUM took data at several strips;
that is, $\theta_z$ was not constant. We account for this by allowing each patch to have $\theta_{z,i}$, a polar angle
which varies from row to row [there are four rows in all]. Eq. (41) is based on the assumption
that the scan took place at constant $\theta_z$, whereas the scans really took place in a "bowtie" pattern.
However, if the secondary chop was rapid enough, the constant azimuth approximation should be
a good one. And indeed our results for the raw data alone agree with other preliminary analyses
(Bond 1993; Srednicki et al. 1993), the latter of which did account for the bowtie pattern. The
prefactor in Eq. (41) is explained in Srednicki et al. (1993); it normalizes the signal so that the
MAX filter really would report $T_1 - T_2$ if it scanned across a region with two different temperatures.

The $\mu$-Peg data correlates very well with dust from the IRAS catalogue. The MAX team
used two methods to extract information about the MBR from this data. First, the IRAS dust 
was directly subtracted, and residuals analyzed as MBR; second, the full data was simultaneously 
fit for dust and MBR. Subsequent analyses have used the residual data set. Both of these methods 
give similar answers. It is worth pointing out though that marginalizing with respect to dust [as we 
do in Figure 5] is yet another method: instead of subtracting off a best fit dust component [as the 
simultaneous fit does] marginalizing integrates over all possible dust contributions, weighting each 
by the internal data. This third method, represented by the solid line in Figure 5, gives results in 
striking agreement with the other two. The dotted line in Figure 5 [the most horizontal one] shows 
what happens if two components are subtracted off. Again, little information can be gleaned; the 
upper limit is 86 $\mu$K.

The likelihood for the GUM scan is also shown in Figure 5. The solid line is the likelihood 
for the raw data, which leads to a high normalization [recall that COBE's normalization is now
$Q_{\rm rms-ps} = 17 \pm 3\,\mu$K]. Table 1 shows that this best fit is not a particularly good fit; however,
marginalizing with respect to cold dust leads to little change in either the best fit or the goodness 
of fit. 

Finally the heavy solid line in Figure 5 shows the combined likelihood function of all four 
experiments we have analyzed. Here we have simply multiplied the four likelihoods together, 
in each case choosing the one-foreground-component-subtracted data. The normalization agrees 
eerily well with that of COBE, but this is not necessarily the number to focus on. Instead, we note 
that the likelihood function thus obtained is quite narrow; Table 1 shows that $Q_{\rm rms-ps} = 18^{+\ldots}_{-\ldots}\,\mu$K,
so the error bars are quite small. This is encouraging, because a valid criticism of this work would
be that we have been too liberal in our assumption about the amplitude of the foreground. The
small error bars show that we have not thrown out too much information. This best fit is also
a poor fit: the probability of getting a $\chi^2$/dof = 208/168 or larger is only 0.02, reflecting the fact
that all of the individual experiments except for the $\mu$-Peg scan exhibit relatively poor fits with
large $\chi^2$'s. The generally poor fit might be simply telling us that the theory we have chosen is
not the correct one. A theory with more power on small scales would help fit the GUM results 
better. Nonetheless, the fact that the two MAX scans sample the anisotropy at the same angular 
scales and find very different results suggests that it will not be easy to fit the data by simply 
adjusting the angular power spectrum. The probable interpretation is that foreground has still not 
been adequately removed from the data. Perhaps the foreground has a very different spectrum 
than has been assumed here, say self-absorbed synchrotron or very cold dust. Of course we may 
have assumed not only the wrong power spectrum for the MBR but also the wrong statistical 
distribution. If the primordial anisotropy field is not Gaussian then one might expect a greater 
probability of having very different levels of anisotropies in one part of the sky than in another. 
This might explain the difference between the $\mu$-Peg and the GUM results. Here it should be
remembered that our analysis of the different experiments includes the uncertainty due to finite
sampling. One cannot ascribe the apparent inconsistency between GUM and $\mu$-Peg under the
CDM model to finite sampling.



5. Are Our Limits Too Conservative? 



In the preceding discussion we have taken into account the possibility of foreground emission
by throwing away those components of measurements which might be affected by the sources of
foreground we are considering. This process of marginalization makes no attempt whatsoever
to determine the amount of foreground contamination from the microwave data themselves. As
illustrated by both Fig. 4 and Table 1, this procedure can indeed lead to very weak limits on
the parameters characterizing the primordial anisotropy. Is throwing out all of this data really
justified?

Clearly the limits produced by marginalization are conservative ones, as it allows for fore- 
ground contamination with an amplitude so large that it is unlikely to have produced the data 
that is being analyzed. One could reasonably try to limit the foreground contamination by insist- 
ing on goodness-of-fit to the microwave data and then using the constraints on the foreground to 
construct a prior for the foreground emission, i.e. $p_6$ of eq. (3). The marginalization procedure
is simpler and more straightforward in that no spatial correlation for the foreground emission has 
to be constructed. In many cases marginalization will not lead to much less stringent limits than 
those obtained by the Bayesian approach just described. The utility of marginalization is best 
considered in light of the commonly used alternative procedures:

1) analyze ignoring the possibility of foreground contamination, 

2) analyze ignoring foreground after removing the best fit foreground, or 

3) analyze after culling of data points which appear to be contaminated. 

We consider all of these procedures dangerous, as we will now explain.

A good guiding principle in dealing with foreground is that one treat the detector noise and 
"foreground noise" (i.e. contamination) on an equal footing. If we have no good information about 
the foreground contamination from radio or IR measurements, the only real difference between
the two is that one has a good idea of the characteristics of the former and very little idea of the
characteristics of the latter. In the MBR community it has been generally accepted that one should 
not use data whose amplitude is less than the sensitivity (i.e. noise level) to put limits on the MBR 
anisotropy below the sensitivity of the detector. We would like to generalize this to say that one 
shouldn't set upper limits on primordial anisotropy below the noise level, and one must include in 
this noise the uncertain contribution from foreground contamination. If one has low detector noise 
and good spectral coverage one can determine the foreground contribution accurately, and thus 
there is little foreground uncertainty or noise. However if one cannot determine the foreground 
contribution accurately then the foreground noise is large and one should account for this in setting 
limits. This philosophy then leads us to reject both alternatives 1) and 2), which take no account 
of the added uncertainties due to foreground. Alternative 2) is especially dangerous since one not 
only underestimates the uncertainties but if the uncertainties in the foreground are large, one may 
also subtract away much of the MBR signal. This is illustrated for the SP91 experiment in Fig. 4.
The small spectral coverage of the experiment allows for very little discrimination between free-free
emission and primordial anisotropy, and subtracting the best fit free-free signal therefore will lead 
to the subtraction of much of any primordial anisotropy present. 

Of course, the problem with alternative 3) is determining which points are likely to be 
contaminated in a way which is unrelated to the amplitude of the primordial anisotropy at those 
points. If one's criterion for culling a data point has anything to do with the amplitude of the
signal obtained at that data point, then one runs the risk of biasing the limits on primordial 
anisotropy. One proper way of culling data from multi-frequency experiments is to drop all of the 
data in patches where the signals which are linearly independent of primordial anisotropies are 
particularly high. Such linear combinations are found by solving equations analogous to eq. (12). 

In many analyses of the SP91 experiment, only the 4th channel is used, essentially culling 
data from channels 1-3. One would hope that the rationale for this comes from the a priori 
knowledge that experiments in this frequency range are most likely to be contaminated by free-free 
and synchrotron emission and therefore channel 4 will be the least contaminated. Unfortunately
the signal in channels 1-3 is significantly greater than that in the 4th channel, which essentially 
meant that the data that had significant signal was dropped and that which had none was kept. 
Since the channels are so close in frequency the signals in channels 1-3 give us a strong indication 
that whatever is causing this signal, and it might be largely primordial anisotropy, should also be 
present in channel 4 since only an implausibly steep spectrum would give a negligible contribution 
to this highest frequency channel. We would therefore argue that the uncertainty in the foreground 
contamination of channel 4 is quite large and therefore any small upper limit on $Q_{\rm rms-ps}$ derived
from analyzing channel 4 while ignoring this foreground noise is unreliable. 

One might argue that upper limits such as that of eq. (38) are too weak since the limit is 
weaker than that obtained by considering only channel 4 and ignoring foreground contamination. 
The rationale for this argument is that it is much more probable that the foreground will increase 
the observed signal rather than decrease it. We do not find these arguments very convincing. 
After all, given only 7 degrees of freedom the probability that $\chi^2$ is less than half its expected
value is 0.165, not a very small number. It is interesting to note that if, say, half the signal comes
from primordial anisotropy and half the signal comes from foreground contamination, then a low
$\chi^2$ is much more likely to be produced by a cancellation of the foreground and the primordial
anisotropy than by having the foreground and primordial contribution both be small. This is
easily understood in terms of phase space arguments as follows. Consider two random variables
$X$ and $Y$ both uniformly distributed in $[-1, 1]$ (see Figure 6). Thus the probability distribution
in the $X$-$Y$ plane is uniform in a square. The locus of possible outcomes which have a given value
of $X + Y$ is a line that cuts through the square at $-45°$, and if this sum is much smaller
than 1 this line passes close to the origin, i.e. close to the diagonal $X = -Y$. Most of the length
of such a diagonal line is not located near the origin where both $X$ and $Y$ are small but is rather
closer to the corners where $|X|, |Y| \sim 1$ and there is strong cancellation. Thus if $X + Y$ is small
it is more likely that $X \approx -Y$ rather than both $X$ and $Y$ being small. The same argument
works when a Gaussian distribution replaces a uniform distribution. To reiterate: cancellation is
likely where the total signal is low even though it is unlikely in general. One may apply this to 
channel 4 of SP91 where we think there is evidence that the signal is low. If both foreground and 
primordial anisotropy are contributing to channels 1-3 we shouldn't be surprised to find significant 
cancellation between primordial and foreground contamination in channel 4. 
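
The phase space argument is easy to check with a short Monte Carlo. The sketch below draws $X$ and $Y$ uniformly in $[-1, 1]$, keeps the draws with $|X + Y| < 0.05$, and compares how often both variables are small with how often both are large and nearly cancel. The thresholds 0.1 and 0.5, the sample size, and the random seed are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1_000_000
x = rng.uniform(-1.0, 1.0, size=n_draws)
y = rng.uniform(-1.0, 1.0, size=n_draws)

small_sum = np.abs(x + y) < 0.05                  # draws with a small total signal
both_small = small_sum & (np.abs(x) < 0.1) & (np.abs(y) < 0.1)
cancelling = small_sum & (np.abs(x) > 0.5) & (np.abs(y) > 0.5)

frac_both_small = both_small.sum() / small_sum.sum()
frac_cancelling = cancelling.sum() / small_sum.sum()

print(f"given |X+Y| < 0.05: both small {frac_both_small:.3f}, "
      f"large but cancelling {frac_cancelling:.3f}")
```

Among the draws with a small sum, cases in which both terms exceed 0.5 in magnitude are several times more common than cases in which both are below 0.1.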

In our analysis we have not made use of any other data outside of the microwave anisotropy 
data to limit the amount of foreground contamination. One should be careful with such procedures 
as they often require bold extrapolations of the spectrum. In contrast, to marginalize we require 
FIG. 6. A plane in which $X$ and $Y$ are randomly distributed. Even if data constrains $|X + Y| < .05$ [the shaded
region in the figure], neither $X$ nor $Y$ is necessarily small. In fact, in much of the shaded region, both $|X|$ and $|Y|$
are of order one.

only that the assumed form of the spectrum holds over a relatively narrow frequency region [i.e.
there is a big difference between extrapolating a $\nu^{-3}$ spectrum from 408 MHz to 40 GHz and
extrapolating the same $\nu^{-3}$ spectrum between 25 and 35 GHz]. While there is often reliable
physics behind the extrapolations there are usually some caveats behind the applicability of these 
extrapolations. For example free-free or synchrotron self-absorption may severely decrease emission 
in the radio of sources which are bright in the microwave. Of course, if we have no idea about the 
spectra of the foreground we cannot marginalize the signal from these sources. Let us hope that 
sources with such problematic spectra do not exist in any abundance. In spite of the caveats we 
do take seriously these measurements. In many cases the data indicates that there should be no 
significant contamination, and in these cases marginalization may yield overly broad upper and 
lower limits. 
In the future what must be hoped for is better data with low enough noise in enough different
frequency channels to be able to fit all of the components to microwave brightness fluctuations.
We would like to point out that the null space analysis of eq. (12) can be useful in choosing
which channels to use for such experiments. Many of the very large upper limits in Table 1 are
simply a result of the coefficients obtained by solving eq. (12). In other words, for some choices of 
channel frequencies the signal-to-noise is greatly reduced by projecting out the foregrounds while 
for other choices the reduction is less severe. By choosing the right frequencies one can optimize the 
signal-to-noise which is left after marginalization. Of course, instrumental considerations as well 
as considerations of atmospheric emission must also play a strong role in the choices of frequencies. 
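
The effect of the channel choice can be sketched by comparing the noise amplification of the best foreground-free combination for two sets of frequencies. The example below is illustrative only: it assumes equal, uncorrelated noise in every channel, a single free-free foreground with $F \propto \nu^{-2.1}$, and a made-up "wide" channel set that does not correspond to any real instrument. The function `noise_gain` (our name) returns the minimum-norm combination with unit response to a flat spectrum and zero response to each foreground.

```python
import numpy as np

def noise_gain(freqs_ghz, spectral_indices):
    """Noise amplification of the minimum-norm foreground-free combination."""
    freqs = np.asarray(freqs_ghz, dtype=float)
    rows = [np.ones_like(freqs)]                          # flat (MBR) spectrum
    rows += [(freqs / freqs[0]) ** k for k in spectral_indices]
    constraints = np.vstack(rows)
    target = np.zeros(len(rows))
    target[0] = 1.0                                       # unit response to MBR only
    z = np.linalg.pinv(constraints) @ target              # minimum-norm solution
    return float(np.linalg.norm(z)), z

sp91_channels = [26.25, 28.75, 31.25, 33.75]              # GHz, from the text
wide_channels = [15.0, 30.0, 60.0, 90.0]                  # GHz, hypothetical

gain_narrow, z_narrow = noise_gain(sp91_channels, [-2.1])
gain_wide, z_wide = noise_gain(wide_channels, [-2.1])
print(f"narrow band: {gain_narrow:.2f}   wide band: {gain_wide:.2f}")
```

Under these assumptions the widely spaced channels lose far less signal-to-noise to the projection than the closely spaced ones, because the foreground spectrum is easier to distinguish from a flat one over a longer lever arm.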

This work was supported in part by the DOE and NASA grant NAGW-2381 at Fermilab. 